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2 Abstract: The semiparametric accelerated failure time model is not as widely used as the Cox relative 



<N 

3 risk model mainly due to computational difficulties. Recent developments in least squares estimation 

4 and induced smoothing estimating equations provide promising tools to make the accelerate failure time 
^ 5 models more attractive in practice. For semiparametric multivariate accelerated failure time models, 
^ ^ 6 we propose a generalized estimating equation approach to account for the multivariate dependence 

7 through working correlation structures. The marginal error distributions can be either identical as in 

8 sequential event settings or different as in parallel event settings. Some regression coefficients can be 
shared across margins as needed. The initial estimator is a rank-based estimator with Gehan's weight, 

10 but obtained from an induced smoothing approach with computation ease. The resulting estimator 

11 is consistent and asymptotically normal, with a variance estimated through a multiplier resampling 

12 method. In a simulation study, our estimator was up to three times as efficient as the initial estimator, 

13 especially with stronger multivariate dependence and heavier censoring percentage. Two real examples 

14 demonstrate the utility of the proposed method. 

C/3 15 Key words and phrases: efficiency; induced smoothing; least squares; multivariate survival. 

> 16 1 Introduction 
in 

00 

^ 17 Multivariate failure times are frequently encountered in biomedical research where failure times are 

O 18 clustered. For example, a diabetic retinopathy study assessed the efficacy of a laser treatment on 

19 decelerating vision loss, measured by time to blindness in the left eye and in the right eye from the 

20 same patient with diabetes (Diabetic Retinopathy Study Research Group, 1976); a colon cancer 

21 study evaluated the treatment effects on prolonging the time to tumor recurrence and time to death 
^ 22 (Lin, 1994). The failure times within the same cluster are associated. Even though the primary 

23 interest most often lies in the marginal effects of covariates on the failure times, accounting for 

24 the within-clustcr dependence may lead to more efficient regression coefficient estimators. For non- 
^ 25 censored multivariate data, the generalized estimating equations (GEE) approach (Liang and Zeger, 

26 1986) has become an important piece in statisticians' toolbox for marginal regression. For censored 

27 multivariate failure times, the marginal accelerated failure time (AFT) model is a counterpart of the 

28 marginal model. This paper aims to develop a GEE approach to make inferences for multivariate 

29 AFT models, taking advantage of recent developments on AFT models with least squares and 

30 induced smoothing. 

31 A semiparametric AFT model is a linear model for the logarithm of the failure times with 

32 error distribution unspecified. A nice interpretation is that the effect of a covariatc is to multiply 

33 the predicted failure time by some constant. It provides an attractive alternative to the popular 

34 relative risk model (Cox, 1972). Three main classes of estimator exist for univariate AFT models. 

35 The Buckley- James (BJ) estimator extends the least squares principle to accommodate censor- 

36 ing through an expectation-maximization (EM) algorithm which iterates between imputing the 
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censored failure times and least squares estimation (Buckley and James, 1979). Despite the nice 

asymptotic properties (Lai and Ying, 1991; Ritov, 1990), the BJ estimator may be hard to get 
as the EM algorithm may not converge. Further, the limiting covariance matrix is difficult to 
estimate because it involves the unknown hazard function of the error term. The second class is 
the rank-based estimator motivated by inverting the weighted log-rank test (Prentice, 1978). Its 
asymptotic properties has been rigorously studied by Tsiatis (1990) and Ying (1993). Due to lack 
of efficient and reliable computing algorithm, the rank-based estimator has not been widely used in 
practice until recently, with numerical strategies for drawing inference developed by Huang (2002) 
and Strawderman (2005). The third class is obtained by minimizing an inverse probability of cen- 
soring weighed (IPCW) loss function (Robins and Rotnitzky, 1992). The IPCW estimator is easy 
to compute, consistent and asymptotically normal (Stute, 1993, 1996; Zhou, 1992), but it requires 
correct specification of the conditional censoring distribution and overlapping of the supports of 
the censoring time and the failure time. 

More recent works have led to a promising perspective on bringing AFT models into routine 
data analysis practice. For rank-based inference, Jin et al. (2003) proposed a linear programming 
approach, exploiting the fact that the weighted rank estimating equation is the gradient of an ob- 
jective function which can be readily solved by linear programming. Variances of the estimators are 
obtained from a resampling method. A computationally more efficient approach for rank-based in- 
ference with Gehan's weight (Gehan, 1965) is the induced smoothing procedure of Brown and Wang 
(2007). This approach is an application of the general induced smoothing method of Brown and 
Wang (2005), where the discontinuous estimating equations are replaced with a smoothed version, 
whose solutions are asymptotically equivalent to those of the former. The smoothed estimating 
equations are differentiable, which facilitates rapid numerical solution and sandwich variance esti- 
mator. Jin et al. (2006a) suggested an iterative least-squared procedure that starts from a consistent 
and asymptotically normal initial estimator such as the one obtained from the rank-based method 
of Jin et al. (2003) . The resulting estimator is consistent and asymptotically normal, with variance 
estimated from a multiplier resampling approach. 

For multivariate AFT models, Jin et al. (2006b) developed rank-based estimating equations that 
are solved via linear programming for marginal regression parameters. Johnson and Strawderman 
(2009) extended the induced smoothing approach for a rank-based estimator with Gehan's weight 
to the case of clustered failure times and showed that the smoothed estimates perform as well as 
those from the best competing methods at a fraction of the computational cost. Jin et al. (2006a) 
considered their least squares method with marginal models for multivariate failure times. All 
these approaches used independent working model and left the within-cluster dependence structure 
unspecified. Li and Yin (2009) developed a generalized method of moments approach for rank-based 
estimator using the quadratic inference function approach (Qu et al., 2000) to incorporate within- 
cluster dependence. Wang and Fu (2011) incorporated within-cluster ranks for the Gehan type 
estimator with the aid of induced smoothing. To the best of our knowledge, little work has been done 
to extend the GEE approach to the setting of multivariate AFT models except a technical report 
(Hornsteiner and Hamerle, 1996), where the BJ estimator was combined with GEE. Nevertheless, 
having no access to recent advances on AFT models, they did not solve the convergence problems, 
and their asymptotic variance estimator formula could not be easily computed because it depends 
on the derivatives of imputed failure times with respect to regression parameters, which might 
explain their overestimation of the variance. 

We propose an iterative GEE procedure to account for multivariate dependence through a work- 
ing covariance or weight matrix. This method has the same spirit as GEE in that misspecification 
of the working covariance matrix does not affect the consistency of the parameter estimator in the 
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1 marginal AFT models; when the working covariance is close to the unknown truth, the estimator 

2 has higher efficiency than that from working independence as used in Jin et al. (2006a). Our initial 

3 estimator is the computationally efficient, rank-based estimator from Johnson and Strawderman 

4 (2009), whose consistency and asymptotic normality is inherited by the resulting GEE estimator. 

5 We develop methods for cases where all marginal distributions are identical and for cases where at 

6 least two margins are different. Regression coefficients can be the same or partially the same across 

7 margins as needed. 

8 The rest of the article is organized as follows. The semiparametric multivariate accelerated 

9 failure time model and the notation are introduced in Section 2. In Section 3, we propose an 

10 iterative GEE procedure to update a consistent and asymptotically normal initial estimator and 

11 present asymptotic properties of our estimator. A large scale simulation study is reported in 

12 Section 4 to assess the properties of the proposed estimator. The proposed methods are illustrated 

13 with the two aforementioned real applications in Section 5. In particular, some new findings are 

14 reported in analyzing the diabetic retinopathy study. A discussion concludes in Section 6. The 

15 sketch of proofs are relegated to the appendix. 

16 2 Multivariate Accelerated Failure Time Model 

17 There are two types of multivariate failure times depending on whether the multiple events are 

18 parallel or sequential. The difference between the two types is that the dimension is fixed for 

19 parallel data while random for sequential data. In a regression model, we generally have different 

20 covariates and different coefficients at each margin for parallel data. For sequential data, however, 

21 some or all covariates and covariate coefficients may be the same across margins. In general, it is 

22 desirable to allow some of the regression coefficients to be shared across margins as needed. We 

23 develop the methodology for parallel data for notational simplicity but comment when appropriate 

24 on how to adapt to sequential data. 

25 Consider a random sample formed by n clusters. For parallel data, all clusters are of size K 

26 while for sequential data, cluster i may have size Ki. For ease of notation, assume at the moment 

27 that the cluster sizes are all equal to K. For i = 1, • • • ,n and k = 1, - ■ ■ ,K, let Tik and Cik be, 

28 respectively, the log-transformed failure time and censoring time for margin k in cluster i. Let 

29 Yik = mm{Tik, Cik) and Aj^ = /(Tj^ < Qk). We stack Yik, Tik, Qk, and Aj^, k = l,...,K,to form 

30 K xl vector 1^, Tj, Cj, and A,, respectively. Let Xi = {Xn, . . . , Xix)^ he a K xp covariate matrix, 

31 with the A;th row denoted by Xj^. The observed data are independent and identically distributed 

32 copies of {Y,A,X}: {{Yi,Ai,Xi) : i = l,...,n}. We assume that Ti and Q are conditionally 

33 independent given Xj. 

34 Our multivariate accelerated failure time model is 

Ti = XiP + ei, (1) 

35 where /3 is a p x 1 vector of regression coefficients, and = {en, . . . , ej^)^ is a random error vec- 

36 tor with an unspecified multivariate distribution. This formulation accommodates margin-specific 

37 regression coefficients, in which case, /3 is a stack of all marginal coefficients, and Xi is a block 

38 diagonal matrix. The error vectors e^'s, i = 1, . . . ,n, are independent and identically distributed. 

39 For parallel data, the K marginal distributions can be all different, while for sequential data, the 

40 number of unique marginal distributions may be smaller or even one as in a recurrent event setting. 
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1 With right censoring, Buckley and James (1979) replaced each response Tjjfc with its condi- 

2 tional expectation Yik{(3) = Eg{Tik\Yik, Au^, Xi^), where the expectation is evaluated at regression 

3 coefficients p. Let Yi{l3) = {Ya{/3), . . . Jin et al. (2006a) defined 



UniP,b)=^{Xi 

1=1 



Xy (Yi{b)-XiP) =0, 



(2) 



4 where X = Y17=i -^i/''^^ ^^d b is an initial estimator of The solution for UniP, f3) is the Buckley- 

5 James estimator. The advantage for fixing the initial value b is to avoid solving for Un{(3, j3) which 

6 is neither continuous nor monotone in /3. Let the Ln{b) be the solution for Un{P,b) 

7 Then L„(6) has a closed-form, 



given b. 



Ln{b) = 



n 



- x)-^ {X, - X) 



.1=1 



-1 



n 

- xy (y,(6) - Y{b) 

.1=1 



(3) 



8 where Y{b) = ^i(b)/?^- Equation (3) leads to an iterative algorithm: = Ln{P^ ^^)., 

9 m > 1. If the initial estimator b is consistent and asymptotically normal, is consistent and 

10 asymptotically normal for every m. 

11 Although this estimator is consistent, its efficiency might be low because it completely ignores 

12 the within-cluster dependence. We next propose to accommodate dependence using the GEE ap- 
is proach, which covers the estimator of Jin et al. (2006a) as a special case with working independence. 



14 3 Inference with GEE 

15 For a given initial estimator b of (5, we propose an updated estimator by solving the GEE 

n 

UniP, b, a) = ^(x, - x)~'n;' {a{b)) (y,(6) - x.p) = 0, (4) 
1=1 

16 where X = ^^^i Xi/n, and fl^^ {o({b)) is a K x K nonsingular working weight matrix which may 

17 involve additional working parameters a, which may depend on b. For given a and b, the solution 

18 of the GEEs (4) has a closed-form 



Ln{b,a) = 



J2{Xi-xyn-\aib)){Xi-x) 



n 

Y.iXi-Xfn-^aib)) {Yi{b)-Y{b) 
1=1 



(5) 



19 This process can be carried out iteratively, summarized as follows. 

20 1. Obtain an initial estimate jin^ = bn of /3 and initialize with m = 1 

21 2. Obtain an estimate of a given P^~^\ a.n{P'!^~^) ■ 



22 3. Update with = L„(/3n" ^^an)• 



^(m-l) 



23 4. Increase m by one and repeat 2 and 3 until convergence. 
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As in Jin et al. (2006a), a consistent and asymptotically normal estimator is important for 

avoiding convergence problems. We propose to use the rank-based estimator with Gehan's weight 
from the induced smoothing approach of Johnson and Strawderman (2009). This estimator has the 
same asymptotic property as the non-smoothed version in Jin et al. (2003), but can be obtained 
with computation ease; its finite sample performance was also reported to be as well as the best 
competing methods (Johnson and Strawderman, 2009). 

The GEEs are most efficient when Qi is chosen to be the covariance matrix of Yi{b). When f^j's 
are the identity matrix (working independence with all marginal variances the same) , our estimator 
reduces to the least squares estimator of Jin et al. (2006a). The working covariance matrix Jlj's are 
the same when all clusters have the same size K; they only vary with i when the cluster sizes are 
not equal. 

For convenience, we assume from now on that E{eik) = 0, i = 1, . . . k = 1, . . . This 
can be achieved by incorporating appropriate columns of ones in Xj, and, hence, adding intercepts 
in (3. Our construction of working covariance involves filling element for k^l E. {1, • • • , -RT}, of 
the working covariance matrix VL. To allow arbitrary number of unique marginal distributions, let 
vrik G {!,...,«;} be the index of the kth margin among the k unique marginal distributions. The 
conditional expectation Yik{h) is computed as 



Yik{b) = AikYik + (1 - A^it) 



^ + Xji^b 

1 - Fk,b{eik{b)} 



where eik{b) = Y^k — Xj^b is the right-censored error evaluated at 6, and Fk^b is the pooled Kaplan- 
Meier estimator of the distribution function Fk^b from the transformed data {eir{b), Aj^ : rur = mfe}, 
which share the same margin m^. Specifically, Fkb is 



Fk,b{t) = 1 - n 1 - X-n ^ 



I {eji{b) > eir{b)) 



To fill the diagonal elements ^kk^ ^ ^ k < K, evaluate the conditional second moment of eik{b) 
given the observed data: 

f°°/,N M^dFfe 5(m) 

Vik{b) = A,keUb) + {l-A,k) ^ '^^ . i = l,...,n, k = l,...,K. (6) 

1 - Fk,b{eik{b)} 

For a given b, we fill Ukk by an unbiased estimator of Var(eifc(6)) 

A 12l<i<n,l<r<K:mr=nn. ^ik{b) 

^hk{b) = :j-r ^ • (7) 

ni^l<r<K niT^r = mk} 

To fill the off-diagonal elements rijt/, k ^ I, define 

eik{b) = Yik{b)-Xlb, i = l,...,n, k = l,...,K, (8) 

the conditional expectation of eik{b) given the observed data. Only when Aj^ = 1 is eik{b) equal to 
eik{b). For a given b, we fill Q^kh k ^ l,hy 

^^feK^') = -Ee*fcWei/(6). (9) 

1=1 
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Because the construction of ejfe(6) does not involve the dependence between pair {k,l) in cluster i, 
eik{b)eii{b) does not have expectation Cov(e,jfc(6), unless Ajj. = Aj; = 1. Nevertheless, i^kl{b) 

is still usable for its simplicity in constructing working covariance. 

Parsimonious working covariance structures such as exchangeable (EX) or autoregressive with 
order 1 (ARl) can be imposed. Parameters a in the working covariance can be estimated with 
method of moment estimator a„ based on Q, as in the non-censored case (Liang and Zeger, 1986). 
When there is no censoring, the working covariance matrix Cl converges to the true covariance 
matrix. This is no longer true when censoring is present. Nevertheless, (l, and consequently, 
still converges to some limit which helps to improve the efficiency of the GEE estimation. 

Extension to unequal cluster sizes as in a recurrent event setting is straightforward. In this 
case, it is reasonable to assume identical marginal error distributions, hence, identical marginal 
variances. The working covariance matrix with dimension Ki x Ki can be constructed with an 
given estimator for a for a specified working covariance structure. 

Under certain regularity conditions, the proposed estimator is consistent to the true regression 
coefficients /3o and asymptotically normal. The asymptotic results are summarized in the following 
theorems, whose proofs are sketched in the Appendix. 

Theorem 1. Under conditions A1-A9 in the Appendix, is a consistent estimator of the true 
parameter Pq for each m> 1. 

Theorem 2. Under conditions A1-A9 in the Appendix, n^/'^0^^ — /3o) converges in distribution 
to multivariate normal with mean zero for each m>l. 



The resampling approach developed by Jin et al. (2006a) is adapted to estimate the covariance 
matrix of Let Zi, i = • ,n, be independent and identically distributed positive random 

variables, independent of the observed data, with E{Zi) = Var(Zi) = 1. Define 



Y*f^{b) = AikYik + (1 - Aik) 



k,b\ 



+ X7kb 



where 



FU(t) = 1 - 



n 



1 - 



ZiAr 



l<i<n,l<r<K:mr=mk,eir<t \ '^J=^ ^l<l<K:mi=mk 

Then the multiplier resampling version of equation (5) has the following form. 



Zil{eji{b) > eir{b)) j ' 



L*nib,a) 



Zi{Xi - X)^ri (a(6)) {Xi - X) 



.i=l 



n 

^ Z,{X, - X)^ri(a(6)) {y;(6) - Y*{b)} 

.i=l 



where a(b) is an estimator of working correlation parameter given regression coefficients evaluated 
at b and Y*{b) = YIU ^i*(&)/^- 

For a realization of (Zi, . . . , Z„) and an initial estimator \ a bootstrap estimator of /3 is 
obtained from iteration P^^* = L* (/3^"^~^^*). The covariance matrix of ^n"*^ can be estimated 
from the sample covariance matrix of a bootstrap sample of /^i™^*. The consistency of this variance 
estimator can be proved following arguments similar to those in Jin et al. (2006a). 
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4 Simulation Study 

We conducted two simulation studies to assess the performance of proposed estimators and com- 
pared its efficiency with the initial estimators from Johnson and Strawderman (2009). The first 
study had a clustered failure time setting with identical regression coefficients across margins and 
identical marginal error distributions. The cluster sizes were fixed at three. For cluster i, the 
multivariate failure time Tj = (Tji, Tj2, Tjs) was generated from 

logTjjt = 2 + Xiik + X2ik + €ik, 

where was Bernoulli with rate 0.5, X2ik was A?^(0,0.5^), and = {en, ei2, Cis) was a trivariate 
random vector specified by identical marginal error distributions and a copula for the dependence 
structure. Three marginal error distributions were considered: standard normal, standard logistic, 
and standard Gumbel, abbreviated by N, L, and G, respectively; the tail of the three distributions 
gets heavier from N to L to G. The dependence structure was specified by a Clayton copula 
with three levels of dependence measured by Kendall's tau: 0, 0.3, and 0.6. Censoring times were 
independently generated from uniform distributions over (0, c), where c was selected for each margin 
to achieve three levels of censoring percentage: 0%, 25%, and 50%. Wc considered random samples 
of size n = 200 clusters. Rank-based estimator with Gehan's weight from the induced smoothing 
approach of Johnson and Strawderman (2009), denoted by JS, was used as the initial estimator 
for GEE estimators. Two working covariance structures, EX and ARl, were used for the proposed 
iterative GEE procedure. The covariance matrix of the estimator was obtained from the resampling 
approach with 200 bootstrap size in Section 3. For each configuration, we did 1000 replicates. 

The results are summarized in Table 1. To save space, only results for nonzero Kendall's 
tau were reported. All estimators appear to be virtually unbiased. The empirical variation of 
the estimates and the estimated variation based on the resampling procedure agree closely for all 
estimators. For a given censoring percentage, as the dependence level increases, the variance of the 
JS estimator changes little, but the variance of the GEE estimators with both working covariance 
structures decreases. Further, the variance from the EX structure is in general smaller than that 
from the ARl structure, which is expected because the true covariance structure is exchangeable 
in this simulation setting. For a fixed dependence level, the effect of censoring percentage on the 
variances of the estimator depends on the marginal error distributions. The variance increases 
clearly as the censoring gets heavier when the errors are normally distributed, but this pattern is 
not observed with Gumbel or logistic marginal error distributions. The relative efficiency of the 
proposed GEE estimator in relative to the rank-based JS estimator is up to 3.5 in the table (with 
logistic margin and Kendall's tau 0.6 for /?2). 

The second simulation setting had multiple event data with different regression coefficients and 
different marginal error distributions. The cluster sizes were still fixed at three. For cluster i, the 
multivariate failure times were generated from 

logTjfe = Pok + PlkXuk + (32k^2ik + ^ik, 

where [Ptdk-, hk^ hk)-> k = 1,2,3, was the regression coefficient vector for margin k, and ej = 
(cii) ei2, Cis) was a trivariate random vector specified by three marginal distributions and a copula 
for dependence. The marginal distributions of were standard normal, standard logistic, and 

standard Gumbel, respective^, for the first, second and third margin; their copula was Clayton 
with three dependence levels measured by Kendall's tau: 0, 0.3, and 0.6. The regression coefficients 
{Pok, Pikj P2k) were set to be (—1,1,-1), (1,-1,1), and (1,1,1), respectively for k = 1, 2, and 3. 
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Table 1: Summary of simulation results with identical regression coefficients and identical 
marginal error distributions based on 1000 replications. Empirical SE is the standard de- 
viation of the parameter estimates; Estimated SE is the mean of the standard error of the 
estimator; RE is the empirical relative efficiencies in relative to the JS estimator. 
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Other settings such as the covariates, censoring time, sample size, initial estimator, bootstrap 
sample size for variance estimation, replication size were all the same as in the first simulation 
setting. In addition to the JS estimator, GEE estimators with two working covariance structures 
were considered: EX and unstructured (UN). 

The results are summarized in Tables 2. Similar to the first simulation study, all estimators 
are virtually unbiased, and their variance estimators are generally close to the empirical variances 
of the replicates. The variance of the GEE estimators decreases as the dependence gets stronger 
at any level of censoring percentage. Holding the dependence level, as the censoring percentage 
increases, the variance increases at the normal margin, but the pattern is different for the other 
two margins. The variance has little changes at the logistic margin. At the Gumbel margin, it 
remains its level as the censoring percentage increases from to 25%, but increases notably as the 
censoring percentage increases from 25% to 50%. There is almost no difference between the two 
working covariance structures, both leading to about the same relative efficiency compared to the 
rank-based JS estimator. The relative efficiency of both GEE estimators almost double as Kendall's 
tau is increased from 0.3 to 0.6. 

5 Application 

The diabetic retinopathy study (DRS) was started in 1971 (Diabetic Retinopathy Study Research 
Group, 1976) with the aim to investigate the efficacy of laser photocoagulation in delaying the 
onset of severe vision loss. Diabetic retinopathy is the most common and serious eye complication 
of diabetes, which may lead to poor vision or even blindness. A subset of the DRS data for patients 
with "high-risk" diabetic retinopathy, categorized by risk group 6 or higher, has been analyzed by 
many authors (e.g., Huster et al., 1989; Lee and Wei, 1993; Liang et al., 1993; Spiekerman and Lin, 
1996). Each of the 197 patients in this subset had one eye randomized to laser treatment and the 
other eye received no treatment. The outcomes of interest were the actual times from initiation of 
treatment to the time when visual acuity dropped below 5/200 at two visits in a row (defined as 
"blindness"). The scientific interest was the effectiveness of the laser treatment and the infiuence 
of other risk factors. In addition to the treatment indicator, three covariates are available: age at 
diagnosis of diabetes, type of diabetes (1 = adult, = juvenile), and risk group (6 to 12, rescaled to 
0.5 to 1.0). Since the interaction between treatment and diabetes type was found to be significant 
in Spiekerman and Lin (1996), we also include this interaction in the model. 

We first fit a bivariate AFT model with identical error margins and identical regression coeffi- 
cients for both left and right eyes. The second AFT model we fit was the opposite, with different 
error margins and different regression coefficients for left and right eyes. For each model, we report 
GEE estimators with working independence and working exchangeable covariance structures, in 
addition to the rank-based JS estimator in Table 3. The GEE estimator with exchangeable work- 
ing structure from the first model suggests that the treatment was significant in delaying the onset 
of vision loss; it had a significant higher effect for adult than for juvenile, and patients in higher risk 
groups tended to lose vision sooner. Note that the treatment effect was not significant if working 
independence were used in the GEE estimator. The second model offered a possibility to check 
whether the marginal error distributions and regression coefficients should indeed be identical as 
assTimcd in the first model. Figure 1 shows the the Kaplan-Meier survival curves of the censored 
residuals for the left margin and right margin respectively, overlaid with the pooled estimate from 
the first model. All three curves appear to be mingled together tightly. A naive log-rank test to 
compare the two margins, ignoring that the regression coefficients were not known but estimated, 
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Table 2: Summary of simulation results with different regression coefficients and different 
marginal error distributions based on 1000 replications. Empirical SE is the standard de- 
viation of the parameter estimates; Estimated SE is the mean of the standard error of the 
estimator; RE is the empirical relative efficiencies in relative to the JS estimator. 
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Table 3: Results of analyzing Diabetic Retinopathy Study. 



JS IND EX 



Margin 


Effects 


EST 


SE 


EST 


SE 


EST 




SE 


Identical 


error margins 


and identical rej 


jression coefficients: 








pooled 


risk group 


-2.659 


0.739 


-2.408 


0.859 


-2.306 


0. 


,775 




age 


-0.010 


0.012 


-0.010 


0.013 


-0.010 


0. 


,014 




diabetes 


-0.140 


0.349 


-0.065 


0.440 


-0.065 


0. 


,369 




treatment 


0.520 


0.197 
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0.330 


0.542 


0. 


,263 




interaction 


1.116 


0.301 


0.961 


0.466 


0.964 


0. 


,410 


Different 


error margins 


and different regression coefficients: 








left 


risk group 


-2.819 


1.114 


-2.832 


1.195 


-2.654 


1. 


,242 




age 


-0.042 


0.016 


-0.037 


0.019 


-0.036 


0. 
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0.554 


0.702 


0, 


,544 




treatment 
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0.422 
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0.549 
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0. 
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interaction 


1.719 


0.650 


1.742 


0.855 


1.739 


0, 


,820 


right 


risk group 


-2.087 


1.013 


-1.944 
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-1.805 


1. 
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0.014 
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0.016 
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0. 
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-0.770 
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-0.640 
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-0.639 


0, 
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treatment 
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0.381 


0.477 


0. 


,446 




interaction 


0.752 


0.476 


0.600 


0.639 


0.603 


0. 


,646 


Identical 


error margins 


with partial common regression coefficients: 






left 


age 


-0.039 


0.015 


-0.036 


0.021 


-0.036 


0. 


,022 




diabetes 


0.892 


0.406 


0.848 


0.607 


0.846 


0. 


,621 
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age 


0.011 


0.015 


0.009 


0.019 


0.009 


0. 


,017 
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-0.870 
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-0.837 


0.499 


-0.835 


0. 


,574 
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treatment 
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0.227 


0.606 


0.250 


0.607 


0. 


,267 




risk group 


-2.588 


0.747 


-2.409 


1.034 


-2.264 


0. 


,938 




interaction 


1.067 


0.318 


1.014 


0.344 


1.014 


0. 


,409 
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Figure 1: Kaplan-Meier survival curves for censored residuals of the two applications. Left: 
the DRS Study. Right: the colon cancer study. 



yielded a p-value of 0.907, confirming the visual observation. Our joint model also allows hypoth- 
esis testing of equal coefficients for each covariate across the two margins with Wald-type tests. 
The coefficients of treatment, risk group, and treatment-diabetes interaction were found to be not 
significantly different across the two margins, with p- values 0.400, 0.278, and 0.147, respectively. 
The coefficients of age and diabetes were found to be significantly different across the two margins, 
with p-values 0.036 and 0.042, respectively. 

We then fit an bivariate AFT model with identical error margins, same coefficients for treatment, 
risk group and treatment-diabetes interaction, and different coefficients for age and diabetes. This 
is one of the many models with intermediate complexity between the first model and the second 
model. Results are summarized in the last section of Table 3. This time, the shared coefficients 
of treatment, risk group, and treatment-diabetes interaction remained significant as before. An 
interesting finding is that the difference between the coefficient of diabetes (0.846 versus —0.835) 
is significantly nonzero with a p-value 0.002, suggesting that the adult diabetes have sooner onset 
of vision loss in right eye than in left eye. This finding has not been reported in existing analyses. 

The second application is a colon cancer study (Lin, 1994). Through randomization, 315, 
310 and 304 patients with stage C colon cancer received observation, levamisole alone (Lev), and 
levamisole combined with fluorouracil (Lev -|- 5FU), respectively. Lin (1994) considered bivariate 
models for the time to first recurrence and the time to death. The research interest was the 
effectiveness of the treatment in prolonging the time to recurrence and time to death. Gender and 
age are available as covariates besides treatment. 

In this application, the error distributions and regression coefficients have no reason to be 
identical across margins. We report results with different error margin and different regression 
coefficients in Table 4. Since all covariates are at the cluster level, the exchangeable and independent 
working covariance structure give the same results (e.g., Hin et al., 2007). The Kaplan-Meier 
survival curves for the two error margins are shown in Figure 1, which clearly exhibits no similarity; 
a naive log-rank test gives p-value 0.0008. The treatment of levamisole combined with fluorouracil 
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Table 4: Result of analyzing Colon Cancer Study 



Margin 


Effects 


Jo 








EST 


SE 


EST 


SE 


recurrence 


Lev 


0.010 


0.124 


0.012 


0.173 




Lev + 5FU 


0.940 


0.138 


0.931 


0.185 




gender 


0.310 


0.111 


0.274 


0.161 




age 


0.011 


0.004 


0.012 


0.006 


death 


Lev 


-0.009 


0.104 


-0.038 


0.131 




Lev + 5FU 


0.458 


0.108 


0.307 


0.136 




gender 


0.064 


0.090 


0.066 


0.111 




age 


-0.003 


0.004 


-0.004 


0.004 



appears to have a significant positive effect on both event times. The gender and age are found not 
to be significant for either time. The estimated difference between the combined treatment effect 
on recurrence and on death (0.931 versus 0.307) has a standard error 0.103, suggesting that the 
combined treatment has a higher effect on recurrence than on death. 

6 Discussion 

The working covariance structure of the proposed GEE approach is different from that in a gen- 
eralized linear model setting, where the variance is assumed to be a function of the mean. The 
errors at each margin are assumed to be independent and identically distributed, and hence have 
the same variance. This assumption may be relaxed by imposing a structure on the variance of 
the errors. For instance, in model (1), we replace ej^ with diki'ik, where fj^'s are independent and 
identically distributed for i = 1, . . . , n with mean zero and variance one, and the scale cTj^ may be 
described by a regression model with covariates. Such specification leads to heteroskedasticity in 
errors and merits further investigation. 

For applications like the DRS study, where there are reasons to impose identical distribution 
across margins, a rigorous test to compare the survival curves of the residuals would be desirable. 
We used naive tests ignoring the fact that the residuals were calculated based on estimated regres- 
sion coefficients. A rigorous test procedure should take into account of the variation caused by the 
estimation procedure. 

A Sketch of the Proofs 

We impose the following regularity conditions: 

Al: < B for all i = 1, • • • ,n and some nonrandom constant B, where || • || is matrix norm. 

A2: The density function of F^^/j exists such that t^dFk^p{t) < oo, for A; = 1, • • • ,K. 
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A3: The distribution function F^^^ is twice differentiable with density such that 

where 1 < A; < and both fk,0{t) and f^p{t) are bounded functions. 
A4: £^[exp(0e^)] + sup^jgjj^ ... -E[exp(6'C,^)] < oo for some > 0, where a" = \a\I{a<o}- 
A5: sup|6|<^._^<t<^ E^=l Ef=i Pr(i < Cik -Xj^h<t + h) = 0{nh) as h ^ and nh ^ oo. 
A6: As n — > oo, is bounded and is n^/^ consistent to ao given p. 

A7: As n — )■ oo, initial estimator bn is n"^/^ consistent to /3o and \/n(^n — /3o) is asymptoticly normal 
with zero mean. 

A8: The slope matrices n~^dUn/dl3 and n~^dUn/db evaluated at (/3o, /3o, "o) converge to nonde- 
generate, finite limit A and B, respectively. 

A9: The derivative d^}^^{a) /da is finite for alH = 1, 2, . . . n. 

Conditions A1-A5 are standard and ensure the existence of the solution of equation (2) (Lai 
and Ying, 1991). It is natural to assume that the working covariance matrix Q in equation (4) 
is a symmetric positive definite matrix. Then there exist a K x K nonsingular matrix, F, such 
that n{ao) = rV2rV2. Let = T'^/^Xi, = T'^^^Yi, Q = T~^/^Ci, and = T-^/^a. Then 
equation (4) evaluated at a = ao can be viewed as equation (2) with the transformed data Xj and 
Yj = min(Yi, C^), with error Wj, i = 1, . . . , n. The existence of the solution to equation (4) can be 
verified by the same arguments as in Lai and Ying (1991), with assumptions similar to Al to A5 
on the transformed data. The consistency and asymptotic normality of the estimator given a = ao 
follow from the same arguments as in Jin et al. (2006a). 

The extra complexity here comes from the fact that equation (4) is solved at a = q;„, an 
estimator of ao- Under condition A9, the ith term in the summation of dUn/da evaluated at 
(^0) Po, ao) is a linear function of Yi{Po) — Xj Pq, i = 1, . . . ,n, with expectation zero. By the law 
of large number, n~^dUn/da evaluated at {l3o,Po,ao) converges to zero in probability. 



A.l Proof of Theorem 1 

At the solution given 6„ and a„, we have n~^Un0n\bn,an) = 0. Taylor expansion at 
{Po,Po,ao) gives 

= -Un{(3o,f3o, «o) + - ^ [UniPo, Po, ao)] - Po) 

+ - ^ [C/n(/3o,/3o,ao)] {hn - ^o) + - ^ [C/„(/3o,/3o,ao)] (a„ - ao) + Op{n-^l'^) 

= ^UniPo, Po, ao) + ^„(/§(i) - Po) + Bnibn " Po) + Cn{an - "o) + Op{n-^/^). (10) 

With regularity conditions A1-A5, the first term converges in probability to zero by the law of 
large number. The convergence of 6„ and a„ in A6 and A7, combined with the limit condition in 
A8 and A9, then gives consistency of Pn to Po- By induction, Pn is consistent for Po at every m. 
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1 A.2 Proof of Theorem 2 

2 Under regularity conditions \/n{Pn^ — /3o) can be expressed as 

1 



Un{Po, Po, ao) + BnVn{bn - Po) + CnVn{&n - ao) 



+ Op{l). (11) 



3 With condition A9, C„ converges to zero in probability, and, hence, with ^/n consistency of an, 

4 Cn\/n{6in — ao) = Op(l). Equation (11) is then asymptotically equivalent to 



n 



Un{(3o,f3o, ao) + BnVnibn - (3o 



5 With the assumption that bn — /3o is asymptoticly normal, there exist some nonrandom functions 

6 iji with zero mean such that, 



i=l 

7 On the other hand, Un{/3o, (3o,ao) is a sum of independent and identically distributed quantities 

8 with zero mean, denoted by ^^'s, i = 1, . . . , n. Equation (11) reduces to 



V^iPi'^ - Po) = [An]-' 



n 



-1/2 



i=l 



+ Op{\\bn - Pq\ 



9 By multivariate central limit theorem for sums of independent random vectors, the asymptotic 

10 distribution for is zero mean multivariate normal as n — )■ oo. The limit covariance matrix E 

11 have the form A~^^A~^, where $ = lim„_^oo '^"^ X^"=i 'w^i^h H = + Brji. Induction then 

12 implies that ^n'"^ is multivariate normal for every m. 
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